Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

Authors

  • Alina Beygelzimer
  • Francesco Orabona
  • Chicheng Zhang
Abstract

We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta = 0$) to squared hinge loss ($\eta = 1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
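
To make the loss family concrete, here is a minimal Python sketch of one parameterization that interpolates between hinge loss and squared hinge loss. The abstract does not give the formula, so the expression below, the function name `loss_eta`, and the reading of `z` as a prediction margin are all assumptions, chosen only because they reproduce the two endpoints named in the abstract.

```python
def loss_eta(z: float, eta: float) -> float:
    """Assumed interpolating loss on a margin z, for eta in [0, 1].

    eta = 0 gives the hinge loss max(0, 1 - z);
    eta = 1 gives the squared hinge loss max(0, 1 - z) ** 2.
    The exact form is an illustrative assumption, not taken from the abstract.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if z > 1.0:
        # Margins above 1 incur no loss for every eta.
        return 0.0
    # Quadratic in z that vanishes at z = 1 and equals 1 at z = 0.
    return 1.0 - 2.0 * z / (2.0 - eta) + eta * z * z / (2.0 - eta)


if __name__ == "__main__":
    for eta in (0.0, 0.5, 1.0):
        print(eta, [round(loss_eta(z, eta), 4) for z in (-1.0, 0.0, 0.5, 1.0, 2.0)])
```

Under this assumed form, every member of the family is zero for margins above 1 and equals 1 at margin 0; only the curvature between those points changes with `eta`.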


Similar resources

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta = 0$) to squared hinge loss ($\eta = 1$). This provides a solution to the...

Full text

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar $\alpha$. We prove that the regret of NEWTRON is $O(\log T)$ when $\alpha$ is a constant that does not vary with horizon $T$, and at most $O(T^{2/3})$ if $\alpha$ is allowed to increase t...

Full text

Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal $O^*(\sqrt{T})$ regret. The setting is a natural generalization of the nonstochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by...

Full text

An Efficient Algorithm for Bandit Linear Optimization

We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal $O^*(\sqrt{T})$ regret. The setting is a natural generalization of the non-stochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered b...

Full text

An Efficient Bandit Algorithm for $\sqrt{T}$-Regret in Online Multiclass Prediction?

Consider a sequence of examples $(x_t, y_t)$ for $t = 1, \ldots, T$ where $x_t \in \mathbb{R}^d$ and $y_t \in [K]$, where the goal of a Learner is to predict the class $y_t$ from the input $x_t$. In the more common full-information setting, the Learner observes the true class $y_t$ after making her prediction $\hat{y}_t$. In the present open problem, however, we will consider the so-called bandit setting: after predicting $\hat{y}_t$, the Learner ...

Full text
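
The bandit multiclass protocol from the last entry above can be sketched in Python as follows. This is a hedged illustration, not the second-order algorithm of the main paper: the learner is a simple first-order, Banditron-style perceptron with uniform exploration, and the names `BanditronSketch`, `run_bandit_protocol`, and the parameter `gamma` are hypothetical. It assumes the standard form of bandit feedback, in which the learner is told only whether its prediction was correct and never sees the true label.

```python
import random


class BanditronSketch:
    """First-order exploring perceptron (illustrative assumption only)."""

    def __init__(self, num_classes, dim, gamma=0.1, seed=0):
        self.K = num_classes
        self.gamma = gamma  # exploration rate (hypothetical default)
        self.W = [[0.0] * dim for _ in range(num_classes)]
        self.rng = random.Random(seed)
        self._last = None

    def predict(self, x):
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.W]
        greedy = max(range(self.K), key=scores.__getitem__)
        # Mix the greedy choice with uniform exploration over all classes.
        probs = [self.gamma / self.K] * self.K
        probs[greedy] += 1.0 - self.gamma
        y_hat = self.rng.choices(range(self.K), weights=probs)[0]
        self._last = (greedy, probs)
        return y_hat

    def update(self, x, y_hat, correct):
        greedy, probs = self._last
        for j, v in enumerate(x):
            # Always push the greedy class down ...
            self.W[greedy][j] -= v
            # ... and push the played class up, importance-weighted,
            # only when the one-bit feedback says it was right.
            if correct:
                self.W[y_hat][j] += v / probs[y_hat]


def run_bandit_protocol(examples, learner):
    """Play the protocol and return the number of mistakes."""
    mistakes = 0
    for x, y in examples:
        y_hat = learner.predict(x)
        correct = (y_hat == y)  # the only information revealed to the learner
        learner.update(x, y_hat, correct)
        mistakes += int(not correct)
    return mistakes


if __name__ == "__main__":
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)] * 50
    print(run_bandit_protocol(data, BanditronSketch(num_classes=2, dim=2)))
```

The true label `y` is used only inside `run_bandit_protocol` to compute the feedback bit; the learner's `update` method receives just its own prediction and that bit, which is what separates the bandit setting from the full-information one.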

Journal title:
  • CoRR

Volume abs/1702.07958

Publication date 2017